#Introduction to Regular Expressions

A regular expression (regex) is a powerful tool for matching and processing text. It is a pattern, written as a string in a specialized syntax, that describes the set of strings to search for.

For example, checking character by character whether an input string looks like an email address is tedious. Instead, a regular expression like:

^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$

can be used to validate its format in a single step.

    import re

    # Validate email format
    email_pattern = r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$'
    if re.match(email_pattern, "user@example.com"):
        print("Valid email")
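To see how the pattern treats malformed input, here is a minimal sketch that wraps it in a helper and runs it over a few strings. The helper name `is_valid_email` and the sample addresses are assumptions made for illustration. The pattern only checks the overall shape of the address, not whether the mailbox exists.

```python
import re

# Same pattern as in the text, compiled once so it can be reused.
EMAIL_PATTERN = re.compile(r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$')


def is_valid_email(address):
    """Return True if `address` has the shape local@domain.tld (hypothetical helper)."""
    return EMAIL_PATTERN.match(address) is not None


# Hypothetical sample inputs, chosen for illustration.
samples = [
    "user@example.com",                 # plain address
    "first.last+tag@mail.example.org",  # dots and plus sign in the local part
    "no-at-sign.example.com",           # missing @
    "user@localhost",                   # no dot after the @
    "user@@example.com",                # doubled @
]

results = {s: is_valid_email(s) for s in samples}
for address, ok in results.items():
    print(f"{address!r}: {'valid' if ok else 'invalid'}")
```

Only the first two samples pass. The others fail because the pattern requires exactly one `@` followed by a domain that contains at least one dot.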

#Metacharacters

| Metacharacter | Meaning | Example |
| --- | --- | --- |
| `.` | Matches any single character (except newline) | `a.c` → abc, a1c |
| `^` | Matches the beginning of the string | `^abc` → matches abcxxxx |
| `$` | Matches the end of the string | `abc$` → matches xxxxabc |
| `*` | Matches 0 or more repetitions of the preceding character | `a*` → "", a, aa |
| `+` | Matches 1 or more repetitions | `a+` → a, aa |
| `?` | Matches 0 or 1 repetition | `a?` → "", a |
| `{n}` | Matches exactly n repetitions | `a{2}` → aa |
| `{min,}` | Matches at least min repetitions | `a{2,}` → aa, aaa, aaaa |
| `{min,max}` | Matches between min and max repetitions | `a{2,3}` → aa, aaa |
| `[]` | Matches any one character inside the brackets | `[abc]` → a, b, c |
| `[^]` | Matches any one character not in the brackets | `[^abc]` → d, e, f |
| `[-]` | Indicates a range | `[a-z]` → a, b, ..., z |
| `()` | Groups expressions | `(abc)+` → abc, abcabc |
| `\|` | OR operator | `abc\|xyz` → abc or xyz |
| `\d` | Matches any digit, same as `[0-9]` | `\d` → 1, 2, 3 |
| `\D` | Matches any non-digit, same as `[^0-9]` | `\D` → a, @, _ |
| `\w` | Matches alphanumeric or underscore, `[a-zA-Z0-9_]` | `\w` → a, 1, _ |
| `\W` | Matches non-word characters, `[^a-zA-Z0-9_]` | `\W` → @, # |
| `\s` | Matches any whitespace character | `\s` → space, \t, \n, etc. |
| `\S` | Matches any non-whitespace character | `\S` → a, 1, @ |
| `\b` | Matches word boundaries | `\bcat\b` → matches cat in a sentence |
| `\B` | Matches non-word boundaries | `\Bcat\B` → matches cat in scatter |
| `\r` | Carriage return | |
| `\n` | Newline | |
| `\f` | Form feed | |
| `\t` | Tab | |
| `\v` | Vertical tab | |
| `\` | Escape character to treat special characters literally | `\+` → + |
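Several of the metacharacters above can be tried out directly with Python's `re` module. The following is a small sketch, not an exhaustive test. The sample strings and the keys of the `results` dictionary are assumptions chosen for illustration.

```python
import re

# Each entry runs one small experiment with a metacharacter from the table.
# All sample strings are hypothetical, chosen for illustration.
results = {
    # . matches any single character except a newline
    "dot": re.findall(r'a.c', "abc a1c a\nc ac"),
    # ^ and $ anchor the pattern to the start and end of the string
    "starts_with_abc": bool(re.search(r'^abc', "abcxxxx")),
    "ends_with_abc": bool(re.search(r'abc$', "xxxxabc")),
    # {min,max} bounds the number of repetitions
    "two_to_three_a": re.findall(r'a{2,3}', "a aa aaa aaaa"),
    # [^...] matches any one character not listed in the brackets
    "not_abc": re.findall(r'[^abc]', "abcdef"),
    # (...) groups an expression so a quantifier applies to the whole group
    "repeated_group": re.fullmatch(r'(abc)+', "abcabc") is not None,
    # | matches either alternative
    "either": re.findall(r'abc|xyz', "abc-xyz-abx"),
    # \d matches digits, \s matches whitespace
    "digits": re.findall(r'\d+', "2024-05-15"),
    "split_on_space": re.split(r'\s+', "a  b\tc\nd"),
    # \b requires a word boundary, \B requires the absence of one
    "whole_word_cat": re.findall(r'\bcat\b', "cat scatter cat."),
    "inner_cat": re.findall(r'\Bcat\B', "cat scatter"),
    # a backslash makes a special character literal
    "literal_plus": re.findall(r'\+', "1+1=2"),
}

for name, value in results.items():
    print(f"{name}: {value!r}")
```

Raw string literals (`r'...'`) are used for the patterns so that backslashes reach the regex engine unchanged.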

#Greedy vs Lazy Matching

By default, quantifiers are greedy: each one matches as many characters as it can while still allowing the overall pattern to match. Adding a ? directly after a quantifier makes it lazy (non-greedy), so it matches as few characters as possible.

| Greedy Pattern | Description | Lazy Pattern | Description |
| --- | --- | --- | --- |
| `.*` | Match 0 or more, longest possible | `.*?` | Match 0 or more, shortest |
| `.+` | Match 1 or more, longest possible | `.+?` | Match 1 or more, shortest |
| `.?` | Match 0 or 1, longest | `.??` | Match 0 or 1, shortest |
| `.{n,m}` | Match n to m times, longest | `.{n,m}?` | Match n to m times, shortest |
| `.{n,}` | Match at least n, longest | `.{n,}?` | Match at least n, shortest |
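The difference is easiest to see when both forms run against the same input. The following is a minimal sketch in Python. The sample strings and variable names are assumptions chosen for illustration.

```python
import re

# Hypothetical sample text, chosen for illustration.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the end of the string, then backs up to the last '>'.
greedy_tags = re.findall(r'<.*>', html)

# Lazy: .*? stops at the first '>' that lets the pattern succeed.
lazy_tags = re.findall(r'<.*?>', html)

# The same contrast with a bounded quantifier and with the optional quantifier.
greedy_run = re.search(r'a{2,4}', "aaaaa").group()
lazy_run = re.search(r'a{2,4}?', "aaaaa").group()
greedy_optional = re.match(r'a?', "a").group()
lazy_optional = re.match(r'a??', "a").group()

print(greedy_tags)  # one match spanning the whole string
print(lazy_tags)    # one match per tag
print(repr(greedy_run), repr(lazy_run))
print(repr(greedy_optional), repr(lazy_optional))
```

The greedy pattern returns the whole string as a single match, while the lazy one returns the four tags separately. Likewise `a{2,4}` takes four characters where `a{2,4}?` takes two, and `a??` matches the empty string where `a?` matches `a`.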

Created on 5/15/2025

Updated on 5/15/2025